21 research outputs found

A location-aware embedding technique for accurate landmark recognition

Author: Bidgoli Navid Mahmoudian
Magliani Federico
Prati Andrea
Publication venue
Publication date: 01/01/2017
Field of study

The current state of research in landmark recognition highlights the good accuracy that can be achieved by embedding techniques such as Fisher vector and VLAD. None of these techniques exploits spatial information: they consider all the features and the corresponding descriptors without embedding their location in the image. This paper presents a new variant of the well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique which accounts, to a certain degree, for the location of features. The driving motivation comes from the observation that, usually, the most interesting part of an image (e.g., the landmark to be recognized) is near the center of the image, while the features at the borders are irrelevant features which do not depend on the landmark. The proposed variant, called locVLAD (location-aware VLAD), computes the mean of two global descriptors: the VLAD computed on the entire original image, and the one computed on a cropped image which removes a certain percentage of the image borders. This simple variant shows an accuracy greater than the existing state-of-the-art approach. Experiments are conducted on two public datasets (ZuBuD and Holidays), which are used both for training and testing. Moreover, a more balanced version of ZuBuD is proposed. Comment: 6 pages, 5 figures, ICDSC 201
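The averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `border` fraction, the final L2 normalization, and the shortcut of selecting features by keypoint position (instead of re-extracting features from a cropped image) are all assumptions made for the example.

```python
import math

def vlad(descriptors, centroids):
    """Plain VLAD: accumulate residuals to the nearest centroid, then L2-normalize."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Index of the nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centroids[i])),
        )
        for j in range(d):
            acc[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def loc_vlad(keypoints, descriptors, centroids, width, height, border=0.1):
    """Mean of the VLAD over all features and the VLAD over central features only.

    Assumption: cropping is approximated by keeping the features whose
    keypoint lies outside a border of `border * size` on every side.
    """
    x0, x1 = border * width, (1 - border) * width
    y0, y1 = border * height, (1 - border) * height
    central = [
        d for (x, y), d in zip(keypoints, descriptors)
        if x0 <= x <= x1 and y0 <= y <= y1
    ]
    full = vlad(descriptors, centroids)
    cropped = vlad(central, centroids)
    return [0.5 * (a + b) for a, b in zip(full, cropped)]
```

With `border=0.0` every feature is kept, so the result reduces to the plain VLAD descriptor; larger values give more weight to features near the image center.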

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Compression pour la communication interactive de contenus visuels

Author: Mahmoudian Bidgoli Navid
Publication venue: HAL CCSD
Publication date: 29/11/2019
Field of study

Interactive images and videos have received increasing attention due to the interesting features they provide. With these contents, users can navigate within the content and explore the scene from the viewpoint they desire. The characteristics of these media make their compression very challenging. On the one hand, the data is captured in high resolution (very large) to provide a real sense of immersion. On the other hand, the user requests only a small portion of the content during navigation. This requires two characteristics: efficient compression of data by exploiting redundancies within the content (to lower the storage cost), and random access ability to extract the part of the compressed stream requested by the user (to lower the transmission rate). Classical compression schemes cannot handle random access optimally because they use a fixed, pre-defined processing order of sources to capture redundancies, which cannot adapt to the user's navigation. The purpose of this thesis is to provide new tools for interactive compression schemes of images. For that, as the first contribution, we propose an evaluation framework by which we can compare different image/video interactive compression schemes. Moreover, former theoretical studies show that random accessibility can be achieved using incremental codes with the same transmission cost as non-interactive schemes and with reasonable storage overhead. Our second contribution is to build a generic coding scheme that can deal with various interactive media. Using this generic coder, we then propose compression tools for 360-degree images and 3D model texture maps with random access ability to extract the requested part. We also propose new representations for these modalities. Finally, we study the effect of model selection on the compression rates of these interactive coders.

INRIA a CCSD electronic archive server

Rate-distortion optimized motion estimation for on-the-sphere compression of 360 videos

Author: Mahmoudian Bidgoli Navid
Marie Alban
Maugey Thomas
Roumy Aline
Publication venue: HAL CCSD
Publication date: 06/06/2021
Field of study

On-the-sphere compression of omnidirectional videos is a very promising approach. First, it reduces computational complexity, as it avoids projecting the sphere onto a 2D map, as is classically done. Second, and more importantly, it achieves a better rate-distortion tradeoff, since neither the visual data nor its domain of definition is distorted. In this paper, the on-the-sphere compression [1] for omnidirectional still images is extended to videos. We first propose a complete review of existing spherical motion models. Then we propose a new one called tangent-linear+t. We finally propose a rate-distortion optimized algorithm to locally choose the best motion model for efficient motion estimation/compensation. For that purpose, we additionally propose a finer search pattern, called spherical-uniform, for the motion parameters, which leads to a more accurate block prediction. The novel algorithm leads to rate-distortion gains compared to methods based on a unique motion model.
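A rate-distortion optimized choice among candidate motion models is commonly expressed as minimizing a Lagrangian cost J = D + λR per block. The sketch below assumes that formulation; the abstract does not state the exact cost used, and the function name, model names, and numbers are hypothetical placeholders.

```python
def choose_motion_model(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost J = D + lam * R.

    candidates: dict mapping a model name to (distortion, rate_in_bits),
                both measured for the same block (hypothetical inputs).
    lam:        Lagrange multiplier trading rate against distortion.
    """
    best_name, best_cost = None, float("inf")
    for name, (distortion, rate) in candidates.items():
        cost = distortion + lam * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost

# Hypothetical per-block measurements for two motion models.
block_candidates = {"model_a": (120.0, 10), "model_b": (100.0, 30)}
print(choose_motion_model(block_candidates, 0.5))
print(choose_motion_model(block_candidates, 2.0))
```

A small λ favors the model with the lower distortion, while a large λ favors the model that is cheaper to signal, which is why the selection is made locally for each block.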

INRIA a CCSD electronic archive server

Méthode d'apprentissage opérant sur la sphère: Application à la compression d'images omnidirectionnelles

Author: Azevedo Roberto
Frossard Pascal
Mahmoudian Bidgoli Navid
Maugey Thomas
Roumy Aline
Publication venue: HAL CCSD
Publication date: 06/09/2022
Field of study

The growing popularity of 360° images implies a need for advanced tools for their representation and processing. In order to circumvent the problems due to the spherical topology, most of the existing methods use planar representations of the sphere, which unfortunately yield an irregular pixel distribution. In this paper, we propose a set of tools working directly on the sphere, enabling the development of advanced learning methods. We illustrate the benefits of the proposed approach in a compression application.

INRIA a CCSD electronic archive server

OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression

Author: Azevedo Roberto
Frossard Pascal
Mahmoudian Bidgoli Navid
Maugey Thomas
Roumy Aline
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022
Field of study

State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. Such results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.

INRIA a CCSD electronic archive server

Compression pour la communication interactive de contenus visuels

Author: Mahmoudian Bidgoli Navid
Publication venue: HAL CCSD
Publication date: 29/11/2019
Field of study

Interactive images and videos have received increasing attention due to the interesting features they provide. With these contents, users can navigate within the content and explore the scene from the viewpoint they desire. The characteristics of these media make their compression very challenging. On the one hand, the data is captured in high resolution (very large) to provide a real sense of immersion. On the other hand, the user requests only a small portion of the content during navigation. This requires two characteristics: efficient compression of data by exploiting redundancies within the content (to lower the storage cost), and random access ability to extract the part of the compressed stream requested by the user (to lower the transmission rate). Classical compression schemes cannot handle random access optimally because they use a fixed, pre-defined processing order of sources to capture redundancies, which cannot adapt to the user's navigation. The purpose of this thesis is to provide new tools for interactive compression schemes of images. For that, as the first contribution, we propose an evaluation framework by which we can compare different image/video interactive compression schemes. Moreover, former theoretical studies show that random accessibility can be achieved using incremental codes with the same transmission cost as non-interactive schemes and with reasonable storage overhead. Our second contribution is to build a generic coding scheme that can deal with various interactive media. Using this generic coder, we then propose compression tools for 360-degree images and 3D model texture maps with random access ability to extract the requested part. We also propose new representations for these modalities. Finally, we study the effect of model selection on the compression rates of these interactive coders.

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL Descartes

Theses.fr

HAL-Rennes 1

Excess rate for model selection in interactive compression using Belief-propagation decoding

Author: Mahmoudian Bidgoli Navid
Maugey Thomas
Roumy Aline
Publication venue: Springer Science and Business Media LLC
Publication date: 19/10/2020
Field of study

Interactive compression refers to the problem of compressing data while sending only the part requested by the user. In this context, the challenge is to perform the extraction directly in the compressed domain. Theoretical results exist, but they assume that the true distribution is known. In practical scenarios, instead, the distribution must be estimated. In this paper, we first formulate the model selection problem for interactive compression and show that it requires estimating the excess rate incurred by mismatched decoding. Then, we propose a new expression to evaluate the excess rate of mismatched decoding in a practical case of interest: when the decoder is the belief-propagation algorithm. We also propose a novel experimental setup to validate this closed-form formula. We show a good match for practical interactive compression schemes based on fixed-length Low-Density Parity-Check (LDPC) codes. This new formula is of great importance for performing model and rate selection.
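For intuition about the excess rate of a mismatched model, the classical baseline for an ideal entropy coder is the Kullback-Leibler divergence: coding a source with true distribution p using a model q costs the cross-entropy H(p, q), which exceeds the entropy H(p) by D(p‖q). The sketch below illustrates only this textbook quantity and a selection rule built on it; it is not the closed-form expression for belief-propagation decoding that the paper proposes, and the function and model names are hypothetical.

```python
import math

def entropy_bits(p):
    """Entropy H(p) in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average code length in bits when p is coded with a model q.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def excess_rate_bits(p, q):
    """Excess rate of the mismatched model q, equal to D(p || q)."""
    return cross_entropy_bits(p, q) - entropy_bits(p)

def select_model(p, models):
    """Return the name of the candidate model with the lowest excess rate."""
    return min(models, key=lambda name: excess_rate_bits(p, models[name]))

# Hypothetical binary source and two candidate models.
true_p = [0.5, 0.5]
candidate_models = {"uniform": [0.5, 0.5], "skewed": [0.25, 0.75]}
print(select_model(true_p, candidate_models))
```

In practice the true distribution is unknown, which is why the excess rate has to be estimated rather than computed directly as in this toy example.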

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Correlation model selection for interactive video communication

Author: Mahmoudian Bidgoli Navid
Maugey Thomas
Roumy Aline
Publication venue: HAL CCSD
Publication date: 17/09/2017
Field of study

Interactive video communication has recently been proposed for multi-view videos. In this scheme, the server has to store the views as compactly as possible, while being able to transmit them independently to the users, who are allowed to navigate interactively among the views and hence request only a subset of them. To achieve this goal, the compression must be done using model-based coding, in which the correlation between the predicted view generated on the user side and the original view has to be modeled by a statistical distribution. In this paper, we propose a framework for lossless fixed-length source coding to select, among a candidate set of models, the one that incurs the lowest extra rate cost to the system. Moreover, in cases where the depth image is available, we provide a method to estimate the correlation model.

INRIA a CCSD electronic archive server

HAL-Rennes 1

Evaluation framework for 360-degree visual content compression with user view-dependent transmission

Author: Mahmoudian Bidgoli Navid
Maugey Thomas
Roumy Aline
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/09/2019
Field of study